Reward Machines: Exploiting Reward Function Structure in Reinforcement Learning
Authors
Abstract
Reinforcement learning (RL) methods usually treat reward functions as black boxes. As such, these methods must extensively interact with the environment in order to discover rewards and optimal policies. In most RL applications, however, users have to program the reward function and, hence, there is the opportunity to make the reward function visible -- to show the reward function's code to the RL agent so it can exploit the function's internal structure to learn optimal policies in a more sample-efficient manner. In this paper, we show how to accomplish this idea in two steps. First, we propose reward machines, a type of finite state machine that supports the specification of reward functions while exposing reward function structure. We then describe different methodologies to exploit this structure to support learning, including automated reward shaping, task decomposition, and counterfactual reasoning with off-policy learning. Experiments on tabular and continuous domains, across different tasks and RL agents, show the benefits of exploiting reward structure with respect to sample efficiency and the quality of resultant policies. Finally, by virtue of being a form of finite state machine, reward machines have the expressive power of a regular language and, as such, support loops, sequences and conditionals, as well as the expression of temporally extended properties typical of linear temporal logic and non-Markovian reward specification.
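The abstract describes a reward machine as a finite state machine whose transitions emit rewards, making sequenced and conditional reward structure visible to the agent. Below is a minimal illustrative sketch of that idea; the class and method names (`RewardMachine`, `step`, `delta_u`, `delta_r`) and the coffee-delivery events are assumptions for illustration, not the paper's exact formalization or API.

```python
# Minimal sketch of a reward machine: a finite state machine whose
# transitions are labelled by high-level events and emit rewards.
# Names and the example task are illustrative assumptions.

class RewardMachine:
    def __init__(self, states, initial, delta_u, delta_r):
        self.states = states    # finite set of machine states U
        self.initial = initial  # initial machine state u0
        self.delta_u = delta_u  # (state, event) -> next machine state
        self.delta_r = delta_r  # (state, event) -> reward emitted
        self.current = initial

    def step(self, event):
        """Advance the machine on an observed event; return the reward."""
        reward = self.delta_r.get((self.current, event), 0.0)
        self.current = self.delta_u.get((self.current, event), self.current)
        return reward

# Example: reward 1 only after observing 'coffee' and then 'office',
# a temporally extended (non-Markovian) task expressed as a sequence.
rm = RewardMachine(
    states={"u0", "u1", "u2"},
    initial="u0",
    delta_u={("u0", "coffee"): "u1", ("u1", "office"): "u2"},
    delta_r={("u1", "office"): 1.0},
)
rewards = [rm.step(e) for e in ["office", "coffee", "office"]]
# rewards == [0.0, 0.0, 1.0]: reaching the office pays off only after coffee
```

Because the machine's states and transitions are visible, a learner can, for instance, treat each machine state as a subtask (decomposition) or evaluate what reward other events would have produced (counterfactual off-policy updates).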
Similar Resources
Reward, Motivation, and Reinforcement Learning
There is substantial evidence that dopamine is involved in reward learning and appetitive conditioning. However, the major reinforcement learning-based theoretical models of classical conditioning (crudely, prediction learning) are actually based on rules designed to explain instrumental conditioning (action learning). Extensive anatomical, pharmacological, and psychological data, particularly ...
Compatible Reward Inverse Reinforcement Learning
PROBLEM
• Inverse Reinforcement Learning (IRL) problem: recover a reward function explaining a set of expert's demonstrations.
• Advantages of IRL over Behavioral Cloning (BC):
  – Transferability of the reward.
• Issues with some IRL methods:
  – How to build the features for the reward function?
  – How to select a reward function among all the optimal ones?
  – What if no access to the environment? ...
An Average-Reward Reinforcement Learning
Recently, there has been growing interest in average-reward reinforcement learning (ARL), an undiscounted optimality framework that is applicable to many different control tasks. ARL seeks to compute gain-optimal control policies that maximize the expected payoff per step. However, gain-optimality has some intrinsic limitations as an optimality criterion, since for example, it cannot distinguish ...
Hierarchical Average Reward Reinforcement Learning
Hierarchical reinforcement learning (HRL) is the study of mechanisms for exploiting the structure of tasks in order to learn more quickly. By decomposing tasks into subtasks, fully or partially specified subtask solutions can be reused in solving tasks at higher levels of abstraction. The theory of semi-Markov decision processes provides a theoretical basis for HRL. Several variant representati...
Maximum reward reinforcement learning: A non-cumulative reward criterion
Existing reinforcement learning paradigms proposed in the literature are guided by two performance criteria, namely: the expected cumulative reward, and the average reward criteria. Both of these criteria assume an inherently present cumulativity or additivity of the rewards. However, such inherent cumulativity of the rewards is not a definite necessity in some contexts. Two possible scenarios are p...
Journal
Journal title: Journal of Artificial Intelligence Research
Year: 2022
ISSN: 1076-9757, 1943-5037
DOI: https://doi.org/10.1613/jair.1.12440